
Introduction

In this computer lab, we will learn how to use automatic tools to forecast time series, including statistical models like exponential smoothing and ARIMA, but with the emphasis now on machine-learning tools like random forests and neural networks.

In particular, we will learn how to use the modeltime package

library(tidyverse) # great collection of packages to visualize and manage datasets  
library(lubridate) # to work with dates
library(fpp3)
library(tsibble) # tidy data structure for time series
library(tidymodels) # machine learning ecosystem of packages (the new caret)
library(modeltime) # automatic forecasting using machine learning
library(modeltime.ensemble) # ensembles   
library(timetk) # similar to tidyverse but for time series (visualization and management of times series)

Maximum temperatures in Madrid

Using data from AEMET, we will download daily maximum temperatures in Madrid from 1960 onwards, and then aggregate them into monthly maximum temperatures

As a summary, in Madrid the summers are hot, dry, and mostly clear and the winters are cold and partly cloudy.

Hence, the goal will be to forecast monthly maximum temperatures in Madrid

# Library to connect to the Spanish Meteorological Agency (AEMET) using its API 
library(climaemet)

aemet_stations()
## # A tibble: 291 × 7
##    indicativo indsinop nombre                 provincia altitud longitud latitud
##    <chr>      <chr>    <chr>                  <chr>       <dbl>    <dbl>   <dbl>
##  1 B013X      08304    ESCORCA, LLUC          ILLES BA…     490     2.89    39.8
##  2 B228       08301    PALMA, PUERTO          ILLES BA…       3     2.63    39.6
##  3 B248       08303    SIERRA DE ALFABIA, BU… ILLES BA…    1030     2.71    39.7
##  4 B278       08306    PALMA DE MALLORCA, AE… ILLES BA…       8     2.74    39.6
##  5 B346X      08310    PORRERES               ILLES BA…     120     3.02    39.5
##  6 B434X      08309    PORTOCOLOM             ILLES BA…      17     3.27    39.4
##  7 B569X      08311    CAPDEPERA              ILLES BA…      57     3.48    39.7
##  8 B691Y      08312    SA POBLA               ILLES BA…      40     3.02    39.7
##  9 B893       08314    MENORCA, AEROPUERTO    ILLES BA…      91     4.22    39.9
## 10 B954       08373    IBIZA, AEROPUERTO      ILLES BA…       6     1.38    38.9
## # ℹ 281 more rows
# select here a station
station <- "3195" # Madrid Retiro

temp_dat <-
  aemet_daily_clim(station, start = "1960-01-01", end = "2024-02-29")

#save(temp_dat, file="temperatures.RData")
#load("temperatures.RData")

# Aggregate to month
temp_month <- temp_dat %>%
  mutate(year.month = yearmonth(fecha)) %>%
  group_by(year.month) %>%
  summarise(max = max(tmax, na.rm = TRUE)) %>%
  arrange(year.month)

# Convert into a tsibble
gtemp_ts <- ts(temp_month$max, start = c(1960, 1), frequency = 12) %>%
  as_tsibble() %>%
  drop_na()

Seasonal plot

temp_month %>% mutate(month=as.factor(month(year.month, label=T, abbr=T))) %>%
  group_by(month) %>% mutate(medTemp = mean(max, na.rm = T)) %>%
  ggplot(aes(x=month, y=max, group=month, fill=medTemp, color=medTemp)) +
  geom_boxplot()+
    scale_fill_gradient(low="lightblue",high="orange")+
  scale_color_gradient(low="blue",high="red")+  
  labs(title="Maximum temperatures in Madrid (Retiro) per month of the year",
  subtitle="Data from 1960 to 2024", y = "Maximum temperature in ºC", x = "")+theme_minimal()+
    theme(plot.background = element_rect(fill='#212121'), text=element_text(size=14,color='#FFFFFF'),
        axis.text = element_text(color='#FFFFFF'), panel.grid.major.y = element_line(color = '#55565B', linetype = "dotted"),panel.grid.major.x = element_line(color = '#55565B', linetype = "dotted"),
        panel.grid.minor.y = element_blank(),panel.grid.minor.x = element_blank(),
        plot.title=element_text(size=20), legend.position="none")

The hottest months in Madrid were June 2019, August 2021, and July 2022, each reaching 40.7 ºC

Machine-learning tools with modeltime

The modeltime package also works with tidy data, and builds its time-series models on top of tidymodels (the new caret)

Hence, it is worth first getting some basic skills with tidymodels: https://www.tidymodels.org/start/

In essence, modeltime combines the fable library (time series, no ML) with the tidymodels ecosystem (machine learning, no time series)

With modeltime, we can train many models at the same time (arima, prophet, random forests, xgboost, neural networks, etc.)

Developed by Matt Dancho: https://business-science.github.io/modeltime/

# we need to change the format of the index date
gtemp_ts$index = as.Date(gtemp_ts$index)

gtemp_ts %>%
  plot_time_series(.date_var=index, .value=value, .interactive = TRUE,
                   .title='Maximum temperatures in Madrid center from 1960',
                   .y_lab = "Monthly data in °C") 

The maximum temperature has increased by about 2 ºC over the last six decades
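As a quick sanity check on that claim, one could regress annual maxima on the year and read off the slope. A minimal sketch with synthetic data (a real check would build annual maxima from temp_month; the series below is simulated, not AEMET data):

```r
# Hedged sketch: estimating a warming trend from annual maxima.
# Synthetic data -- real values would come from aggregating temp_month by year.
set.seed(1)
year <- 1960:2023
temp_year <- 37 + 0.03 * (year - 1960) + rnorm(length(year), sd = 1)

fit <- lm(temp_year ~ year)
slope_per_decade <- 10 * coef(fit)[["year"]]
round(slope_per_decade, 2)  # positive slope, roughly 0.3 degrees per decade
```

Over six decades, a slope of roughly 0.3 ºC per decade adds up to about the 2 ºC rise visible in the plot.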

Seasonal plot with modeltime:

gtemp_ts %>%
    plot_seasonal_diagnostics(index, value, .interactive = T)

Time-series decomposition with modeltime:

gtemp_ts %>%
    plot_stl_diagnostics(index, value,
        .frequency = "auto", .trend = "auto",
        .feature_set = c("observed", "season", "trend", "remainder"),
        .interactive = T)

With the timetk library, we can easily split the time series into training and testing sets

Train-test split (just once)

The last 12 months of data form the testing set; the previous 20 years form the training set

# forecasting horizon = 1 year
# training window = 20 years

splits <- gtemp_ts %>%
  time_series_split(date_var = index, initial = "20 years", assess  = "1 year")

# visualize the split
splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(index, value, .interactive = FALSE)

Time-series cross-validation

Let’s try a 5-fold time-series cross-validation

With real time series, which are typically non-stationary, it is usually better to use a fixed-length (sliding) training window than an expanding one
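To make the fixed-window idea concrete, here is an illustrative base-R version of rolling-origin splits (the function rolling_splits is made up for this sketch; time_series_cv() below does the real work):

```r
# Hedged sketch: rolling-origin CV with a fixed-length training window.
# Indices stand for months 1..n of the series.
rolling_splits <- function(n, initial, assess, skip, fixed = TRUE) {
  splits <- list()
  test_end <- n
  while (test_end - assess - initial + 1 >= 1) {
    train_end   <- test_end - assess
    train_start <- if (fixed) train_end - initial + 1 else 1
    splits[[length(splits) + 1]] <- list(train = train_start:train_end,
                                         test  = (train_end + 1):test_end)
    test_end <- test_end - skip   # move the forecast origin back one step
  }
  splits
}

# 240 months, 15-year fixed window, 1-year test, forecast every year
s <- rolling_splits(n = 240, initial = 180, assess = 12, skip = 12)
length(s)            # 5 slices
range(s[[1]]$train)  # 49 228: always 180 months of training data
```

With cumulative = TRUE (fixed = FALSE here), the training set would instead grow from the start of the series at every slice.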

# forecasting horizon = 1 year
# training window = 15 years, always fixed (cumulative = FALSE)
# forecast every year

splits <- time_series_cv(data = gtemp_ts,
                         date_var = index,
                         initial     = "15 years",  # window for the training set
                         assess      = "1 year",    # h = 12 months
                         skip        = "12 months", # forecast every year
                         slice_limit = 5,           # maximum number of blocks/slices
                         cumulative  = FALSE)

# visualize the splits
splits %>%
    plot_time_series_cv_plan(index, value, .interactive = FALSE)

Note how the training set always contains 15 years while the testing set contains 1 year

Train

Train the models in an automatic way: first auto.arima, then ETS and prophet

We will use the first split (forecasting Mar 2023 to Feb 2024 from the previous 15 years), but any other split could be selected

Train three basic models (no ML): auto.arima, ETS, and prophet

sp = 1 # select here the split

# First the auto.arima of Hyndman
model_fit_arima = 
  arima_reg() %>% 
  # arima_reg(seasonal_period = 12, seasonal_differences = 1, non_seasonal_differences = 1) %>%
  set_engine("auto_arima") %>%
  fit(value ~ index, training(splits$splits[[sp]])) 
# modeltime models require a date column to be a regressor

model_fit_arima
## parsnip model object
## 
## Series: outcome 
## ARIMA(0,0,2)(1,1,2)[12] 
## 
## Coefficients:
##          ma1     ma2    sar1     sma1   sma2
##       0.0133  0.1375  0.1355  -1.1965  0.276
## s.e.  0.0803  0.0801  0.4427   0.4114  0.410
## 
## sigma^2 = 4.181:  log likelihood = -367.54
## AIC=747.08   AICc=747.6   BIC=765.82
# Error-Trend-Season (ETS) model
model_fit_ets = 
  exp_smoothing() %>% 
  set_engine("ets") %>%
  fit(value ~ index, training(splits$splits[[sp]])) 

model_fit_ets
## parsnip model object
## 
## ETS(M,N,A) 
## 
## Call:
##  forecast::ets(y = outcome, model = model_ets, damped = damping_ets,  
## 
##  Call:
##      alpha = alpha, beta = beta, gamma = gamma) 
## 
##   Smoothing parameters:
##     alpha = 1e-04 
##     gamma = 1e-04 
## 
##   Initial states:
##     l = 26.8153 
##     s = -8.4407 -11.1048 -11.5541 -7.3178 -0.0793 6.3929
##            11.4142 10.9763 9.9194 4.4873 -0.6481 -4.0454
## 
##   sigma:  0.0788
## 
##      AIC     AICc      BIC 
## 1201.547 1204.474 1249.441
# Now the prophet by facebook
model_fit_prophet <- prophet_reg() %>%
  set_engine("prophet", monthly.seasonality = TRUE) %>%
  fit(value ~ index, training(splits$splits[[sp]]))
model_fit_prophet
## parsnip model object
## 
## PROPHET Model
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
## - extra_regressors: 0

Machine Learning for time series

ML tools are harder to apply to time series because we first need to create the relevant features

Pre-processing

We are going to create the features using recipe objects (with steps)

See more details in https://www.tidymodels.org/

The recipe:

recipe_spec <- recipe(value ~ index, training(splits$splits[[sp]])) %>%
  step_timeseries_signature(index) %>%
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),contains("day"),
          contains("week"), contains("second"), contains("xts"), contains("iso"), index_month, index_quarter, index_half, index_index.num) %>%
  step_dummy(all_nominal()) %>%
  step_fourier(index, K = 2, period = 12) 
# step_rm is to remove features we are not going to use

# Just to see the data frame with the recipe
recipe_spec %>% prep() %>% juice() %>% View()

By default, many features are created automatically. Unnecessary features can be removed using recipes::step_rm()
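For intuition, the Fourier terms added by step_fourier(K = 2, period = 12) are just sine/cosine pairs at the seasonal frequency and its harmonic. A base-R sketch (fourier_features is an illustrative helper, not part of any package used here):

```r
# Hedged sketch: the kind of columns step_fourier(K = 2, period = 12) creates.
fourier_features <- function(t, K, period) {
  X <- do.call(cbind, lapply(1:K, function(k)
    cbind(sin(2 * pi * k * t / period),
          cos(2 * pi * k * t / period))))
  colnames(X) <- paste0(rep(c("sin", "cos"), K), rep(1:K, each = 2))
  X
}

X <- fourier_features(t = 1:24, K = 2, period = 12)
dim(X)  # 24 rows, 4 columns: sin1, cos1, sin2, cos2
```

A linear model on these columns can fit any smooth 12-month seasonal pattern with up to two harmonics.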

Train

We can fit any model using different computational engines

See more details in https://www.tidymodels.org/

Let’s try glmnet, random forest, XGBoost, neural networks, and prophet

For each model, we need to define a workflow: a container that bundles everything required to fit and predict (model + features + training data)

# glmnet
model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet")

# Hyper-parameters can be optimized in this way: https://business-science.github.io/modeltime/articles/parallel-processing.html

workflow_fit_glmnet <- workflow() %>%
  add_model(model_spec_glmnet) %>%
  add_recipe(recipe_spec %>% step_rm(index)) %>%
  fit(training(splits$splits[[sp]]))

# randomForest
model_spec_rf <- rand_forest(trees = 500, mode = "regression") %>%
  set_engine("randomForest")

workflow_fit_rf <- workflow() %>%
  add_model(model_spec_rf) %>%
  add_recipe(recipe_spec %>% step_rm(index)) %>%
  fit(training(splits$splits[[sp]]))

# XGBoost
model_spec_xgboost <- boost_tree(mode = "regression") %>%
    set_engine("xgboost")

wflw_fit_xgboost <- workflow() %>%
    add_model(model_spec_xgboost) %>%
    add_recipe(recipe_spec %>% step_rm(index)) %>%
    fit(training(splits$splits[[sp]]))

# NNETAR
model_spec_nnetar <- nnetar_reg(seasonal_period = 12, mode = "regression") %>%
    set_engine("nnetar")

wflw_fit_nnetar <- workflow() %>%
    add_model(model_spec_nnetar) %>%
    add_recipe(recipe_spec) %>%
    fit(training(splits$splits[[sp]]))

# Prophet with all the features

model_spec_prophet <- prophet_reg(
      seasonality_yearly = TRUE
    ) %>%
    set_engine("prophet") 

wflw_fit_prophet <- workflow() %>%
    add_model(model_spec_prophet) %>%
    add_recipe(recipe_spec) %>%
    fit(training(splits$splits[[sp]]))

The modeltime table organizes the models with IDs and creates generic descriptions to help us keep track of our models

model_table <- modeltime_table(
  model_fit_arima, 
  model_fit_ets, 
  model_fit_prophet,
  workflow_fit_glmnet,
  workflow_fit_rf,
  wflw_fit_xgboost,
  wflw_fit_nnetar,
  wflw_fit_prophet
) 
model_table
## # Modeltime Table
## # A tibble: 8 × 3
##   .model_id .model     .model_desc            
##       <int> <list>     <chr>                  
## 1         1 <fit[+]>   ARIMA(0,0,2)(1,1,2)[12]
## 2         2 <fit[+]>   ETS(M,N,A)             
## 3         3 <fit[+]>   PROPHET                
## 4         4 <workflow> GLMNET                 
## 5         5 <workflow> RANDOMFOREST           
## 6         6 <workflow> XGBOOST                
## 7         7 <workflow> NNAR(1,1,10)[12]       
## 8         8 <workflow> PROPHET W/ REGRESSORS

Forecasting

First, model calibration is used to quantify error and estimate confidence intervals

Model calibration will be performed on the out-of-sample data (testing sets) with the modeltime_calibrate() function
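Conceptually, calibration stores the out-of-sample residuals and builds an empirical band around the point forecasts. A minimal base-R sketch of that idea (synthetic numbers; modeltime's actual interval method may differ in detail):

```r
# Hedged sketch: an empirical interval from out-of-sample residuals.
set.seed(42)
actual   <- 30 + rnorm(12, sd = 2)  # synthetic test-set observations
forecast <- rep(30, 12)             # synthetic point forecasts

residuals <- actual - forecast
q <- quantile(residuals, probs = c(0.025, 0.975), names = FALSE)
lower <- forecast + q[1]
upper <- forecast + q[2]
mean(actual >= lower & actual <= upper)  # empirical coverage on this sample
```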

calibration_table <- model_table %>%
  modeltime_calibrate(new_data = testing(splits$splits[[sp]]))
calibration_table
## # Modeltime Table
## # A tibble: 8 × 5
##   .model_id .model     .model_desc             .type .calibration_data
##       <int> <list>     <chr>                   <chr> <list>           
## 1         1 <fit[+]>   ARIMA(0,0,2)(1,1,2)[12] Test  <tibble [12 × 4]>
## 2         2 <fit[+]>   ETS(M,N,A)              Test  <tibble [12 × 4]>
## 3         3 <fit[+]>   PROPHET                 Test  <tibble [12 × 4]>
## 4         4 <workflow> GLMNET                  Test  <tibble [12 × 4]>
## 5         5 <workflow> RANDOMFOREST            Test  <tibble [12 × 4]>
## 6         6 <workflow> XGBOOST                 Test  <tibble [12 × 4]>
## 7         7 <workflow> NNAR(1,1,10)[12]        Test  <tibble [12 × 4]>
## 8         8 <workflow> PROPHET W/ REGRESSORS   Test  <tibble [12 × 4]>

Now, with calibrated data, we can forecast all the models

calibration_table %>%
  modeltime_forecast(actual_data = filter(gtemp_ts,year(index)>=2018)) %>%
  plot_modeltime_forecast(.interactive = FALSE) + labs(title = "Forecasts",    y = "",    caption = "modeltime"      )

Accuracy table in the first testing set

calibration_table %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table
.model_id .model_desc .type mae mape mase smape rmse rsq
1 ARIMA(0,0,2)(1,1,2)[12] Test 1.98 8.04 0.43 8.19 2.34 0.93
2 ETS(M,N,A) Test 1.99 8.03 0.43 8.22 2.35 0.93
3 PROPHET Test 2.17 9.06 0.47 9.12 2.46 0.92
4 GLMNET Test 1.93 7.96 0.42 8.03 2.37 0.92
5 RANDOMFOREST Test 2.37 10.05 0.51 10.10 2.89 0.89
6 XGBOOST Test 2.72 10.78 0.59 11.37 3.38 0.87
7 NNAR(1,1,10)[12] Test 3.16 12.90 0.68 13.88 4.16 0.81
8 PROPHET W/ REGRESSORS Test 2.09 8.67 0.45 8.83 2.48 0.92
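The metrics in the table can be computed by hand; a base-R sketch with toy numbers (MASE is omitted because it needs the in-sample seasonal-naive errors as a scale):

```r
# Hedged sketch: the accuracy metrics reported by modeltime_accuracy().
mae   <- function(a, f) mean(abs(a - f))
mape  <- function(a, f) 100 * mean(abs((a - f) / a))
smape <- function(a, f) 100 * mean(2 * abs(a - f) / (abs(a) + abs(f)))
rmse  <- function(a, f) sqrt(mean((a - f)^2))

actual   <- c(20, 25, 30, 35)  # toy numbers, not the lab's data
forecast <- c(22, 24, 33, 34)
round(c(mae = mae(actual, forecast), rmse = rmse(actual, forecast)), 2)
# mae 1.75, rmse 1.94
```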

Insights?

Re-train models for accuracy resample

We should now rerun the previous workflow on the remaining splits

To do that, let’s resample the accuracy of each model (here fitted on the first split) over the other slices (testing sets)

Resample forecasts: for each slice, the model is re-fitted on that slice’s training set and forecasts are produced for its testing set. That is, we take each model’s specification from the first split (just the model type) and refit (re-train) it repeatedly on the resampled data (the remaining splits)

resample_results <- model_table %>%
  modeltime_fit_resamples(
    resamples = splits,
    control   = control_resamples(verbose = TRUE)
  )
## ── Fitting Resamples ────────────────────────────────────────────
## 
## 26.513 sec elapsed
resample_results
## # Modeltime Table
## # A tibble: 8 × 4
##   .model_id .model     .model_desc             .resample_results
##       <int> <list>     <chr>                   <list>           
## 1         1 <fit[+]>   ARIMA(0,0,2)(1,1,2)[12] <rsmp[+]>        
## 2         2 <fit[+]>   ETS(M,N,A)              <rsmp[+]>        
## 3         3 <fit[+]>   PROPHET                 <rsmp[+]>        
## 4         4 <workflow> GLMNET                  <rsmp[+]>        
## 5         5 <workflow> RANDOMFOREST            <rsmp[+]>        
## 6         6 <workflow> XGBOOST                 <rsmp[+]>        
## 7         7 <workflow> NNAR(1,1,10)[12]        <rsmp[+]>        
## 8         8 <workflow> PROPHET W/ REGRESSORS   <rsmp[+]>

Visualize the accuracy for each model and each slice

resample_results %>%
  plot_modeltime_resamples(
    .summary_fn  = mean, 
    .point_size  = 3,
    .interactive = FALSE
  )

The average performance:

resample_results %>%
  modeltime_resample_accuracy(summary_fns = list(mean = mean)) %>%
  table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table
.model_id .model_desc .type n mae_mean mape_mean mase_mean smape_mean rmse_mean rsq_mean
1 ARIMA(0,0,2)(1,1,2)[12] Resamples 5 1.68 6.68 0.38 6.57 2.13 0.95
2 ETS(M,N,A) Resamples 5 1.55 6.14 0.35 6.10 1.98 0.95
3 PROPHET Resamples 5 1.57 6.18 0.36 6.06 2.01 0.95
4 GLMNET Resamples 5 1.56 6.26 0.35 6.13 2.00 0.95
5 RANDOMFOREST Resamples 5 1.90 7.79 0.43 7.64 2.31 0.94
6 XGBOOST Resamples 5 2.05 8.14 0.46 8.23 2.49 0.93
7 NNAR(1,1,10)[12] Resamples 5 2.52 9.95 0.57 10.06 3.29 0.88
8 PROPHET W/ REGRESSORS Resamples 5 1.60 6.29 0.36 6.21 2.02 0.95

Which tools are more robust across the different splits?

This suggests ARIMA works well but needs to be re-trained each time, while the ML tools, especially prophet and random forests, do not need re-training as often

Ensembles

Finally, let’s combine the best models with a simple-average ensemble to forecast 2024

Besides the testing data (the last available 12 months), we will forecast 12 additional months (genuinely unseen). That is, our final forecasting horizon will be \(h=24\)
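What ensemble_average(type = "mean") computes is simply the per-horizon average of the member forecasts; a tiny base-R illustration with made-up numbers:

```r
# Hedged sketch: a simple-average ensemble is just rowMeans of the forecasts.
fc <- cbind(arima   = c(31, 28),
            ets     = c(33, 30),
            prophet = c(32, 29))  # made-up 2-step forecasts from 3 models
ensemble <- rowMeans(fc)
ensemble  # 32 29
```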

future.data = testing(splits$splits[[sp]]) %>% future_frame(index, .length_out = "1 year", .bind_data = T)

# calibrate again to obtain longer forecasts
calibration_table <- model_table %>% modeltime_calibrate(future.data)

# Select here your favourite models
iaux = c(1, 2, 3, 5) 

# make the ensemble
ensemble_fit_avg <- calibration_table[iaux,] %>%
    ensemble_average(type = "mean")
ensemble_fit_avg
## ── Modeltime Ensemble ───────────────────────────────────────────
## Ensemble of 4 Models (MEAN)
## 
## # Modeltime Table
## # A tibble: 4 × 5
##   .model_id .model     .model_desc             .type .calibration_data
##       <int> <list>     <chr>                   <chr> <list>           
## 1         1 <fit[+]>   ARIMA(0,0,2)(1,1,2)[12] Test  <tibble [24 × 4]>
## 2         2 <fit[+]>   ETS(M,N,A)              Test  <tibble [24 × 4]>
## 3         3 <fit[+]>   PROPHET                 Test  <tibble [24 × 4]>
## 4         5 <workflow> RANDOMFOREST            Test  <tibble [24 × 4]>
# calibration and performance
ensemble_fit_avg %>%
  modeltime_calibrate(testing(splits$splits[[sp]])) %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table
.model_id .model_desc .type mae mape mase smape rmse rsq
1 ENSEMBLE (MEAN): 4 MODELS Test 2.08 8.61 0.45 8.73 2.47 0.92
ensemble_fit_avg %>% modeltime_table() %>%
  modeltime_forecast(actual_data = filter(gtemp_ts,year(index)>=2018)) %>%
  plot_modeltime_forecast(.interactive = FALSE) + labs(title = "24-months ahead forecasts",    y = "",    caption = "modeltime"      ) +
  scale_x_date(breaks = scales::date_breaks("1 year"),
               labels = scales::date_format("%Y"))

Performance in 2023 looks good, and the 2024 forecast follows a similar seasonal pattern

During the summer of 2023, maximum temperatures were higher than expected, and at the very end of the series they were lower than expected.